SPECIFICATION 



TO ALL WHOM IT MAY CONCERN: 

BE IT KNOWN THAT WE, Tomohiro Yam.ana, a citizen of 
Japan residing at Kawasaki, Japan, Shinichiro Tago, a 
citizen of Japan residing at Kawasaki, Japan, Taizoh Satoh, 
a citizen of Japan residing at Kawasaki, Japan, Yoshimasa 
Takebe, a citizen of Japan residing at Kawasaki, Japan and 
Yasuhiro Yamazaki, a citizen of Japan residing at Kawasaki, 
Japan have invented certain new and useful improvements in 

INFORMATION PROCESSING DEVICE AND COMPUTER SYSTEM 

of which the following is a specification : - 



TITLE OF THE INVENTION 

INFORMATION PROCESSING DEVICE AND COMPUTER 

SYSTEM 

BACKGROUND OF THE INVENTION 

1 . Field of the Invention 

The present Invention relates to an 
Information processing device and particularly 
relates to a parallel-processing processor which can 
execute Very Long Instruction Words (VLIW) . 

Due to recent requirements for a higher 
processing speed of computing systems. It Is 
required to Increase number of Instructions 
executable In parallel In a single clock cycle for 
each processor. To meet such a requirement, systems 
utilizing Very Long Instruction Words or VLIW 
Instructions have been proposed. However, such 
systems have a problem that for some program, a 
considerable number of nops (no operation). I.e., 
Instructions which do not need Interpretation or 
execution thereof, are stored In a real memory. 
When there are a considerable number of nop 
Instructions, more space In an Instruction cache or 
a main memory becomes unusable. This may give rise 
to a reduction In performance. Accordingly, for 
VLIW systems, there Is a need for a device which can 
store Instructions executable In parallel Into a 
real memory without adding any nop Instruction, 
expand them Into a group of Instructions executable 
In parallel and supply the group of Instructions 
executable In parallel. 

2 . Description of the Related Art 
Firstly, referring to Figs. 1 and 2, a 

computing system using VLIW Instructions will be 
described. 

Fig. 1 shows a configuration of a 
parallel-processing processor of the related art. 
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As shown in Fig. 1, the parallel-processing 
processor 10 of the related art includes an 
instruction read-out part 1 connected to a memory 7 , 
an instruction submitting part 3 connected to the 
5 instruction read-out part 1 , instruction executing 
parts EUO-EUn each connected to the instruction 
submitting part 3, and a register part 5 connected 
to all of the instruction executing parts EUO-EUn. 
The instruction read-out part 1 reads out 
10 instruction words from the memory 7 and supplies the 
instruction words to the instruction submitting part 
3. 

The instruction submitting part 3 submits 
elementary instructions in the supplied instruction 
J^' 15 words to the instruction executing parts EUO-EUn. 

IQ If the instruction executing parts EUO-EUn are 

already executing preceding elementary instructions, 
the instruction submitting part 3 waits for the 
g preceding instructions to complete, and then 

p 2 0 supplies next elementary instructions to the 

fU relevant instruction executing parts . The 

13 instruction executing parts EUO-EUn execute the 

supplied elementary instructions . Upon completion 
of executions, the instruction executing parts EUO- 

2 5 EUn inform the instruction submitting part 3 that 
the executions have been completed. 

The register part 5 supplies data to the 
instruction execution parts EUO-EUn if necessary, 
and holds the result of execution at each of the 
30 instruction executing parts EUO-EUn. Also, the 

externally connected memory 7 stores a sequence of 
instruction words to be executed on the parallel- 
processing processor 10 and also stores data 
required by the instruction execution parts EUO-EUn 

3 5 for executing instructions and data obtained as a 
result of the execution. 

Fig. 2 is a diagram showing a format of an 
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instruction word supplied to the above-mentioned 
parallel-processing processor having four 
instruction-executing parts EUO-EUn. As shown in 
Fig* 2, the instruction word includes elementary 
5 instructions EI and non-operational instructions nop, 
It can be seen that the ratio of the non-operational 
instructions considerably high when the number of 
elementary instructions included in a single 
instruction word and executed in parallel is less 
10 than the number of instruction executing parts EUO- 
EUn • 

In other words , with a system for 
parallel-processing a plurality of elementary 
instructions using the prior art VLIW, since the 
15 instruction words have fixed length, when the number 
CO of instruction words to be executed in parallel is 

lil less than a predetermined number, non-operational 

fn instructions must be added to the VLIW to provide a 

predetermined length. Therefore, for a program 
1^ 2 0 having a low degree of parallel processing of the 

fU instructions, an increase in the number of non- 

operational instructions causes an increase of the 
amount of instruction code. This causes problems 
such as a reduced usage efficiency of the memory, a 

2 5 reduced hit ratio of the cache memory and an 
increase in the load of the instruction read-out 
mechanism. 

Some methods have been proposed to 
overcome such problems. For example, it is known to 

3 0 provide infoxrmation in the instruction to indicate 
whether it is an nop instruction and to prohibit the 
operation of an operation part if it is a nop 
instruction (e.g., Japanese laid-open patent no. 8- 
161169) . Another known method is to make use of 

3 5 variable length VLIW instructions to omit nop parts 
and to expand VLIW instruction including only 
executable instructions and assigning them among the 
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operation parts (e.g., Japanese laid-open patent no. 
11-281957 by the Applicant) . 

However, further improvements are possible 
with such known methods for efficiently reading out 
5 the VLIW instructions and for efficiently assigning 
them among the operation parts execute operations . 

In detail , the instruction submitting part 
3 of the prior art shown in Fig. 1 can hold n 
instructions supplied from the read-out part 1. 
10 That is to say, the instruction submitting part 3 
has a IXn (1-row x n-column) configuration. 
Instruction submitting part having such 
P configuration is adopted for configurations 

described in the above-mentioned Japanese laid-open 
Iq 15 patent nos. 8-161169 and 11-281957. However, this 

Co is inefficient since the instruction submitting part 

in 

3 has a IXn configuration and the VLIW instructions 
en must be frequently read out from the memory 7. Also, 

^ since the VLIW instructions have a variable length, 

1^ 2 0 it is not guaranteed that all the VLIW instructions 

fU would be stored in the instruction submitting part 3. 

That is to say, there may be a case where only a 
part of a VLIW instruction is transferred to the 
instruction submitting part 3 and the remaining part 
2 5 remains in the memory 7. In such a case, the VLIW 
instruction cannot be executed until the remaining 
part of the VLIW instruction is read out from the 
memory 7 and transferred to the instruction 
submitting part 3 . 
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SUMMARY OF THE INVENTION 



Accordingly, it is a general object of the 
present invention to provide an information 
processing device and a computing system which can 
3 5 obviate the above-mentioned prior art problems. 

It is another and more specific object of 
the present invention to provide an information 
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processing devic and a computing systiem which can 
read out instructions such as VLIW instructions and 
distribute them to operation parts in an efficient 
manner . 

5 In order to achieve the above-mentioned 

objects, the information device of the present 
invention an information processing device includes: 
an mXn (m-row x n-column) instruction 

buffer ; 

10 a plurality of instruction executing parts 

executing a plurality of instructions in parallel; 
and 

a control circuit selecting a 
predetermined number of instructions from the mXn 
15 instruction buffer and distributing the instructions 
to the instruction executing parts . 

With the information processing device and 
a computing system described above r instructions 
such as VLIW instructions can be read out and 



^ 2 0 distributed to operation parts in an efficient 



fU manner. 



BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram showing a 

2 5 parallel processing processor of the related art. 

Fig. 2 is a diagram showing a format of an 
instruction word supplied to the parallel processing 
process shown in Fig. 1 having four instruction 
executing parts . 
30 Fig. 3 is a block diagram showing a 

parallel processing processor of a first embodiment 
of the present invention. 

Fig. 4 is a diagram illustrating 
instructions used in the present invention. 

3 5 Fig, 5 is a detailed block diagram of the 

parallel processing processor of Fig. 3. 

Fig. 6 is a block diagram showing a 



variant of the parallel processing processor of a 
first embodiment of the present invention. 

Fig. 7 is a block diagram showing a 
parallel processing processor of a second embodiment 
of the present invention. 

Fig. 8 is a block diagram showing a 
configuration of a parallel processing processor of 
an embodiment of the present invention and a 
computing system including thereof. 

Fig. 9 is a flowchart showing an operation 
of the controller shown in Fig. 8. 

Fig. 10 is a diagram showing an 
arrangement of instructions used for explaining an 
example of an operation of the parallel processor 
shown in Fig. 8. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In the following, principles and 
embodiments of the present invention will be 
described with reference to the accompanying 
drawings . 

Fig. 3 is a block diagram showing a 
parallel processing processor of a first embodiment 
of the present invention. The parallel processor 
shown in Fig. 3 includes a first instruction buffer 
21, a first selection circuit 22, a second 
instruction buffer 23, a second selection circuit 24 
and groups operation parts 25, 26 and 27, each group 
including two operation parts . 

The first instruction buffer 21 has an mX 
n (m-row x n-column) configuration (m and n are each 
an arbitrary integer.) An example shown in Fig. 3 
has a 3X6 configuration. The first instruction 
buffer 21 receives VLIW instructions of a variable 
length which are read out from a memory 28 described 
later with reference to Fig. 4, such that VLIW 
instructions are received as a group of n VLIW 
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instructions and expanded to an mXn (m-row x n- 
column) configuration. H re, a row with n columns 
is defined as a block. Also, a portion holding a 
single instruction (an instruction portion and 
5 portions for holding the above-mentioned first and 
second information) is referred to as a unit buffer. 
Accordingly, the first instruction buffer 21 
includes 18 (=3X6) unit buffers. 

Referring now to Fig. 4, an instruction 
10 used in the present embodiment will be described. 
Fig. 4 shows a memory 2 8 used in the present 
embodiment and a plurality of instructions stored 
therein. The memory shown in Fig. 1 is indicated 
with reference numeral 7 . The variable-length 
15 instructions used in the configuration of the 
eg present embodiment are each provided with a flag 29 

[^l which is information showing whether the relevant 

instruction is simultaneously executable. The flag 

^ 29 takes a value "0" or "1". For example, 

M 

2 0 instruction 1 is provided with the flag 29 having a 
fy value "0" and instruction 4 is provided with the 

?3 flag 29 having a value "1". The value "0" indicates 

that the instruction is a simultaneously executable 
instruction and the value "1" shows that the 

2 5 instruction is not an instruction which is 
simultaneously executable with an immediately 
succeeding instruction. In the example of Fig. 4, 
instructions 1 to 4 are simultaneously executable 
instructions. Since the flag 29 of instruction 4 

3 0 takes a value "1", instruction 4 cannot be 
simultaneously executed with instruction 5. Since 
the flag 29 of instruction 5 takes a value "1", 
instruction 5 cannot be simultaneously executed with 
instruction 6 . 

3 5 When the memory 28 storing such a group of 

instructions is expanded two-dimensionally according 
the prior art method using nops, it giv s a 
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conf iguration as shown by reference numeral 28A in 
Fig. 4. One row includes four instructions and each 
row forms a single VLIW instruction. The 
instructions are stored in such a manner that four 
5 instructions are stored in the bottom row from left 
to right, and then four instructions are stored in 
the second row from the left to right. An 
instruction address shown in Fig. 4 is incremented 
by four per a single instruction. Each flag 29 
10 added to the last instructions of each VLIW 

instruction takes a value "1". That is to say, the 
flag 29 is information showing a boundary between 
□ the VLIW instructions. (The flag 29 is referred to 

as a packing flag) . 
15 Instructions are read out from the memory 

CO 28 of Fig. 4 and are written in the first 

instruction buffer 21 shown in Fig. 3 in a block-by- 
5P block manner (i.e., six instructions at a time) 

2 sequentially from the bottom left. Digits "0" and 

1^ 2 0 "1" shown in Fig. 3 indicates above-mentioned 

fU packing flags. Note that in Fig. 3, labels 

indicating individual instruction in Fig. 4, such as 
instruction 1, are omitted. 

Also, in Fig. 3, each unit buffer is 

2 5 labeled with symbols "A", "B" and "C" . These 
symbols are second information applied to each of 
the instructions (note that the first information is 
a packing flag) . The second information shows a 
type of instruction. In other words, it can be said 

3 0 that the second information shows one of the groups 
of operation parts 25 to 27 to be processed. The 
operation parts 25, 26 and 27 execute operations A, 
B, and C, respectively. For example, operations A, 
B and C are an integer number operation, a floating 

3 5 point operation and a branch instruction operation, 
respectively . 

In the example shown in Fig. 3, 
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instructions are written into the bottom row of the 
first instruction buffer from the left in a 
sequential manner. The first two unit buffers are 
empty and there are no instruction written in them. 
5 Subsequently, there are four instructions with flags 
"0" having instruction types "B" , "A, "B" and "C" , 
respectively. In the second bottom row (middle row) , 
instructions having flags with values "0", "1", "0", 
"1", "0" and "0" and having instruction types "A", 
10 "C", "B", "C", "A" and "C" are written from the left. 
In the top row, instructions having flags with 
values "1", "0", "0", "1", "0" and "1" and having 

C3 instruction types "C", "B" , "A", "C" , "A", "B" are 

written from the left. 

fQ 15 As has been described above, the packing 

tQ flags indicate the boundaries of the VLIW 

in 

I J instruction. Therefore, in the example show in Fig, 

{n 3, the first VLIW instruction includes six 

^ instructions of instruction types "B", "A", "B", "C", 

[I 2 0 "A" and "C" , the second VLIW instruction includes 

fU two instructions of instruction types "B" and "C" , 

^3 and the third VLIW instruction includes three 

instructions of instruction types "A", "C" and "C", 
respectively. (Here, fourth and further VLIW 

2 5 instructions are not explained in detail) . It is to 

be noted that the first VLIW instruction extends 
over two blocks (from the bottom row to the center 
row) , and the third VLIW instruction also extends 
over two blocks (from the center rows to the top 

3 0 row) . With the configuration of the present 

embodiment, the VLIW instructions extending over a 
plurality of rows (blocks) can be read out from the 
first instruction buffer in a single read operation. 
Since the prior art configuration is a 1X6 (1-row x 
3 5 6-column) configuration, in order to process the 

above-describe VLIW, there is a restriction that a 
VLIW instruction must be complete within one row. 



Therefore, an efficient processing is cannot be 
implemented. 

The first selection circuit 22 includes 
six selectors each having three inputs and one 
output (3 to 1 selectors) . Fig. 5 is a detailed 
diagram of the configuration shown in Fig. 3. The 
first selection circuit 22 includes six selectors 
(sel) each having three inputs and one output (3 to 
1 selectors) 22i to 226. Each selector 22i to 226 
receives three instructions read out from three unit 
buffers of a corresponding column and selects one of 
them. For example, the selectors 22i to 226 each 
selects information read out from unit buffers 
accompanied by * , Selection is carried out by 
referring to the values of the packing flags. 
Selection control will be described later. 

The selected instructions are held in the 
second instruction buffer 23. The second 
instruction buffer 23 has IXn (in the present 
embodiment, 6 columns) configuration. That is to 
say, as shown in Fig. 5, the second instruction 
buffer 23 includes six unit buffers 23i to 236. Fig. 
5 shows how the selected instructions are read out 
from the unit buffers accompanied by * and the 
selected instructions are stored. The output of 
each unit buffers 23i to 236 supplied to the 
second selection circuit 24. 

As shown in Fig, 5, the second selection 
circuit 24 includes six selectors (sel) having six 
inputs and one output (6 to 1 selectors) 24i to 246 • 
The six inputs of each selector 24i to 246 are 
connected to the outputs of six unit buffers 23i to 
236. The outputs of the selectors 24i and 242 are 
connected to the group of operation parts 25. The 
outputs of the selectors 243 and 244 are connected 
to the group of operation parts 26. The outputs of 
the sel ctors 245 and 246 are connected to the group 
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of operat:ion part:s 27. 

The second select:xon circuit 24 implem nts 
a selection operation by referring to the 
instruction types "A", "B" and "C". For example, 
5 the selectors 24i and 242 connected to the group of 
operation parts 25 of type "A" select the unit 
buffers 233 and 234 holding instructions of type "A" 
Similarly, the selectors 243 and 244 connected to 
the group of operation parts 26 of type "B" select 
10 the unit buffers 235 and 236 holding instructions of 
type "B" . 

Each of the groups of operation parts 25 
to 27 is provided with two pipeline slots (operation 
parts) . The group of operation parts 25 is provided 

'Z 15 with pipeline slots Al and A2 , the group of 

Cy 

IQ operation parts 26 is provided with pipeline slots 

Bl and B2 , and the group of operation parts 27 is 
provided with pipeline slots CI and C2 • 
E As has been described above, the VLIW 

p 2 0 instruction extending over a plurality of blocks can 

ry be simultaneously supplied to the groups of 

operation parts 25 to 27. 

Accordingly, summarizing the first 
embodiment of the present invention, a parallel 

2 5 processing processor is provided which includes an 
information buffer 21 of mXn (m-row x n-column) 
configuration, instruction executing parts 25 to 27 
for executing a plurality of instructions in 
parallel, a control circuit (a circuit including a 

30 first selection circuit 22, a second information 
buffer 23 and a second selection circuit 24) for 
selecting a predetermined number of instructions 
from the instruction buffer of mXn (m-row x n- 
column) and distributing them to the instruction 

3 5 executing parts. 

Fig, 6 is a block diagram showing a 

variant of the parallel processing processor of a 



first embodiment of the present invention. This 
variant is characterized in that a second selection 
circuit 124 shown in Fig. 6 is used instead of the 
second selection circuit 24 shown in Fig, 5. The 
second selection circuit 124 includes six selectors 
(sel) each having three inputs and one output (3 to 
1 selectors) 124i to 1246. Each one of the 
selectors 124i to 1246 is connected to the 
predetermined three unit buffers of the second 
instruction buffer 23. For example, the selectors 
124i, 1243, and 1245 are connected to the outputs of 
the unit buffers 23i, 23$ and 235, respectively. Th 
selectors 1242, 1244, and 1246 are connected to the 
outputs of the unit buffers 232 r 234 and 236 r 
respectively . 

The configuration shown in Fig. 6 
corresponds to a case where a VLIW instruction that 
matches one of instructions in the sequence of 
predetermined simultaneously executable instructions 
is supplied from first instruction buffer 21 to the 
first selection circuit 22 and then outputted to the 
second instruction buffer 23. The sequence of 
predetermined simultaneously executable instructions 
implies that when there are two instructions of the 
same type, these must be arranged in a continuous 
manner . 

Further, a configuration is considered in 
which the selectors 124i and 1242 are directly 
connected to the pipeline slots Al and A2 of the 
group of operation parts 25, respectively, the 
selectors 1243 and 1244 are directly connected to 
the pipeline slots Bl and B2 of the group of 
operation parts 26, respectively, and the selectors 
1245 and 1246 are directly connected to the pipeline 
slots CI and C2 of the group of operation parts 27, 
respectively. Then, the unit buffers 23i, 233 and 
235 accompanied by odd reference numerals are 



processed by the pipeline slots Al, Bl and CI of odd 
reference numeral and the unit buff rs 232, 234 and 
236 accompanied by even reference numerals are 
processed by of pipeline slots A2, B2 and C2 of even 
reference numeral. That is to say, the VLIW 
instructions stored in the second instruction buffer 
23 must meet such a condition (or rule) . 

In the configuration shown in Fig. 6, 
instructions are arranged in the memory 2 8 as a 
seqpjence of predetermined simultaneously executable 
instructions. For example, such an arrangement of 
instructions follows constraints that up to two 
instructions of types "A", "B" and "C" can exist in 
the VLIW instruction and if there are two 
instructions, instructions of the same type cannot 
be arranged at neighboring positions. In stead of 
such constraints, when reading out the instructions 
stored in the first instruction buffer 21, a 
controller described below may refer to the 
information indicating the type of instruction to 
determine whether it is a sequence of predetermined 
simultaneously executable instructions. Only when 
it is determined as a sequence of predetermined 
simultaneously executable instruction, the 
instruction is selected and transferred to the 
second instruction buffer 23. 

Fig. 7 is a block diagram showing a 
parallel processing processor of a second embodiment 
of the present invention. The configuration of the 
present embodiment is characterized in that in 
addition to a limitation on a sequence of 
predetermined simultaneously executable instructions , 
there is a limitation of a number of predetermined 
simultaneously executable instructions for 
processing the VLIW instructions. 

The present embodiment includes a first 
instruction buffer 31 of a 3X4 (3-row x 4-column) 
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configuration, a first selection circuit 32, a 
second instruction buffer 33 of a 1X4 (1-row x 4- 
column) configuration, a second selection circuit 34, 
and three groups of operation parts 25 to 27. The 
5 first selection circuit 32 includes four selectors 
having three inputs and one output (3 to 1 
selectors) . Each selector selects instructions read 
out from three unit buffers of the corresponding 
column and outputs them to the corresponding unit 
10 buffer of the second instruction buffer 33, The 

selection operation is carried out by referring to 
the packing flag added to each instruction. 
f3 The second instruction buffer 33 includes 

four unit buffers. That is to say, with the second 
15 embodiment, it is possible to deal with variable 
Iq length VLIW instructions each including up to four 

instructions, i.e., VLIW having a maximum variable 
length of four. In general, the configuration of 
a the second embodiment cannot deal with more than 

2 0 four VLIW instructions and, therefore, it is a 

\^ 

fU configuration for a case where an upper limit of 

t3 four is set as a number of simultaneously executable 

f ^ 

J f instructions . 

With the second embodiment, instructions 

2 5 are arranged in the memory 28 such that there are no 

more than four predetermined simultaneously 
executable instructions. Alternatively, when 
reading out the instructions stored in the first 
instruction buffer 31, a controller described below 
30 may refer to the number of instructions included in 
the VLIW instruction to be processed to determine 
whether it meets the condition of the number of 
predetermined simultaneously executable instructions . 
Only when it is determined that it meets number of 

3 5 predetermined simultaneously executable instructions, 

the instructions are selected and transferred to the 
second instruction buffer 33. 



The second selectiion circuit 34 includes 
four selectors each having four inputs and one 
output (4 to 1 selectors) . The inputs of each 
selector are connected to four unit buffers of the 
second instruction buffer 33 and the output is 
connected to predetermined one of the groups of 
operation parts « That is to say, only the VLIW 
instruction having the sequence of predetermined 
simultaneously executable instructions is stored in 
the second instruction buffer 33* With the 
configuration of Fig. 1, when the length of the 
instruction is four, all arrangements such as "AABC" 
"AABC" and "ABCC" can be processed. Therefore, as 
in the case of the variant of the first embodiment 
described above (Fig. 6) , instructions may be 
arranged in the memory 28 as a sequence of 
predetermined simultaneously executable instructions 
Alternatively, when reading out the instructions 
stored in the first instruction buffer 31, a 
controller described below may refer to the 
information indicating the type of instruction to 
determine whether it is a sequence of predetermined 
simultaneously executable instructions . 

Fig. 8 is a block diagram showing a 
configuration of a parallel processing processor of 
an embodiment of the present invention and a 
computing system including thereof. 

The computing system includes a memory 50, 
an instruction cache memory 51, a data cache memory 
52, and a parallel processing processor described 
below. The parallel processing processor includes 
an instruction buffer 41, an instruction buffer 42, 
a selection circuit 43, operation parts 43 to 49, a 
controller 53, pointers 54 to 56 and a program 
counter 57 . 

The instruction buffer 41 is provided as 
an integrated configuration including the above- 



menliloned first instruction buffer 21 (or 31) and 
the first sel ction circuit 22 (or 32) . The 
instruction buffer 41 has a 4X4 (4-row x 4-column) 
configuration including four rows #0 to #3 and four 
columns #0 to #3. The instruction buffer 41 is 
provided with sixteen (=4x4) unit buffers 241 
arranged in four rows and four columns. Each unit 
buffer 241 includes a portion for storing an 
instruction (INST.), a portion for storing a packing 
flag (P) and a portion for storing flags for 
specifying types of operation (F, B) . If the 
instruction is a floating point operation, the flags 
are set as F=l and B=0 . If the instruction is a 
branch instruction, the flags are set as F=0 and B=l 
If the instruction is an integer operation, the 
flags are set as F=0 and B=0 . Three operations 
identified by a combination of the flags F and B 
correspond to the above-described types of 
instruction A, B and C. 

The instruction buffer 41 includes a 
selector 141. The selector 141 receives four 
instructions read out from the instruction cache 
memory 51 and writes them into a row (block) 
specified by the controller 53. 

The instruction buffer 41 includes four 
selectors having four inputs and one output (4 to 1 
selectors) 341i to 34I4. These selectors 341i to 
34I4 correspond to the above-described first 
selection circuit 22 (32) . The inputs of the 
selector 341i are connected to the outputs of the 
unit buffers 241 in column #0. Similarly, the 
inputs of the selectors 3412 to 34I4 are connected 
to the outputs of the unit buffers 241 in column #1 
to #3, respectively. These selectors 341i to 34I4 
are controlled by the controller 53 . 

The instruction buffer 42 corresponds to 
the above-described second instruction buffer 23 
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(33) and includes four unit buffers 142i to 1424. 
The unit buffers 142i to 1424 rec ive outputs of the 
selectors 341x to 34l4r respectively. 

The selection circuit 43 corresponds to 
5 the above-described second selection circuit 24 (34) 
and includes six selectors each having four inputs 
and one output (4 to 1 selectors) 143i to 1434. The 
inputs of each of the selectors 143i to 1434 are 
connected to the outputs of four unit buffers 142i 
10 to 1424- Each of the selectors 143i to 1434 selects 
one of the four inputs according to the instructions 
of the controller 53 . The outputs of the selectors 
p 143i to 1434 are supplied to the operation parts 44 

to 49, respectively. 
)y 15 The operating parts 44 and 46 are integer 

fg operation parts. The operating parts 45 and 47 are 

J''' floating point operation parts. The operating parts 

#1^ 48 and 49 are branch instruction operation parts. 

£ The operating parts 44 to 47 communicate data to and 

• 2 0 from the cache memory 52, The operation results of 

fy the operation parts 48 and 49 are supplied to the 

P controller 53. 

The controller 53 controls the instruction 
cache memory 51, the instruction buffer 41, the 

2 5 instruction buffer 42 and the selection circuit 43 
in accordance with the flowchart shown in Fig. 9. 
Such a control is carried out using the pointers 54 
to 56 and the program counter 57. The pointer 
(TOPPTR) 54 indicates on which row the starting 

30 address is placed. In other words, the pointer 54 
points to one of the rows #0 to #3, The pointer 
(REAPTR) 55 indicates which of the four instructions 
in the row pointed by the pointer 54 is the starting 
instruction. That is to say, the pointer 55 

3 5 specifies the starting address by the row number and 
the column number. The pointer (WRITEPTR) 56 
indicates the row in which the instruction is to be 



stored in the instruction buffer 41. In other words 
the pointer 56 points one of the row numbers #0 to 
#3. Even is the pointer 56 points to a certain row, 
no more instructions can be stored in the pointed 
row if the instruction buffer 41 is full of 
instructions. The program counter (PC) 57 stores an 
address on the instruction cache memory 51 when 
storing the instruction in the row. That is to say, 
the program counters PC#0 to PC#3 are provided for 
the rows #0 to #3, respectively, and stores an 
address on the instruction cache memory 51 when 
storing the instruction in the row. Note that in 
with the configuration shown in Fig. 8, the 
instructions are written in from the top left 
position of the instruction buffer (row #0, column 
#0) . 

Now, an operation of the computing system 
of Fig. 8 will be described with reference to Fig. 9 
Fig. 9 is a flowchart showing a control operation of 
the controller 53 . 

Firstly, the controller 53 initializes the 
pointers 54, 55 and 56 to set pointer values to 0 
and clears the content of each of the unit buffers 
241 of the instruction buffer 41 (step Sll) . Then, 
the controller 53 retrieves (fetches) a row of 
instructions from the instruction cache memory 51, 
and increments the pointer 56 by one, and sets the 
program counter 57 (step S12) . Then, the controller 
53 determines whether or not to generate a cancel 
request (CANCEL) (step S13) . The cancel request 
clears all instructions stored in the instruction 
buffer 41 when the fetched instruction are disabled 
by mistaken branch predictions and exceptions . When 
the result of determination of step S13 is YES, the 
controller 53 returns to step Sll. When the result 
of determination of step S13 is NO, the controller 
53 proceeds to step S14. 



The controller 53 determines whether it is 
possible to supply (release) a VLIW instruction to 
the instruction buffer 42 (step S14) , In other 
words, the controller 53 determines whether the 
instructions constituting a single VLIW instruction 
are stored in the instruction buffer 141. This 
determination is implemented by referring to the 
packing flag P. When the result of determination of 
step S14 is NO, the controller 53 proceeds to step 
S12. In step S12, four instructions are read into 
the next row and the pointer 56 is incremented by 
one. When the result of determination of step S14 
is YES, the controller 53 proceeds to step S15. 

The controller 53 releases the VLIW 
instruction having packing flags P as boundaries to 
the instruction buffer 42 (step S15) . At the same 
time, the controller 53 increments the pointer 54 by 
one if necessary, and updates the pointer 55. Then, 
the process of the controller 53 proceeds to step 
S16. 

The controller 53 determines whether the 
instruction buffer 41 is full or not (step S16) . If 
it is determined that the instruction buffer 41 is 
not full, the process of the controller 53 returns 
to step S12. If it is determined that the 
instruction buffer 41 is full, the fetching of the 
instructions from the instruction memory 51 by the 
controller 53 is stopped (step S17) . Then, the 
process returns to step S14. 

Now, an example of an operation according 
to the above-described flow chart will be described. 
An explanation is made with an example of storing 
the instructions into the instruction buffer 41 as 
shown in Fig. 10. 

All unit buffers 241 in the instruction 
buffer 41 are reset and the pointers 54 and 56 are 
reset to row #0 (step Sll) . The pointer 55 is also 
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reset: to 0. In this state, four instructions INSTl 
to INST4 of a single row read out from the 
instruction cache memory 51 are fetched into the 
unit buffer 241 of row #0 (step S12) • The pointer 
5 56 is incremented and points to row #1. Step S13 is 
passed with the determination result NO, and the 
packing flag of row #0 is checked at step S14. 
Since the packing flag of the instruction INST 3 at 
position row #0, column #2 is "1", the result of 
10 determination of step 314 is YES. 

In step S15, since the values of the 
pointers 54 and 55 are 0, and the packing flag at a 

C3 position in column #2 is "0", INSTl to INST3 are 

stored in the unit buffers 142i, 1422 r and 1423 of 

fQ 15 the instruction buffer 42. Then, the pointer 55 is 

CO incremented to point to column #3 . The pointer 54 

in 

l"; is not incremented • Then, the process passes step 

Ifi S16 and proceeds to step S122. 

s Note that the controller 53 controls the 

H 

2 0 selection circuit 43 by referring to the values of 
fU flags F and B stored in the unit buffers 142i, 1422, 

and 1423. In this case, since the instruction INSTl 

U 

has flags taking values of F=B=0, the selector 143i 
receives it from the unit buffer 142i and provides 

2 5 it to the operation part 44. Since the instruction 

INST2 also has flags taking values of F=B=0, the 
selector 1433 receives it from the unit buffer 1422 
and provides it to the operation part 46. Since the 
instruction INST3 has flags taking values of F=1,B=0, 

3 0 the selector 1432 receives it from the unit buffer 

1423 and provides it to the operation part 45. 

In step S12, subsequent four instructions 
INST5 to INST8 are read out and stored in four unit 
buffers 241 of row #1 where the pointer 56 is 
3 5 pointing. Then, the process passes through the 
determination result of NO of step S13 and then 
proceeds to step S14. Since the determination 
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result of s1:ep S14 Is YES, the process proceeds -to 
s-tep S15. 

The pointer 54 points to row #0 and the 
pointer 55 points to row #3. Also, the packing flag 
5 with a value "1" is at row #1, column #2. 

Accordingly, instructions INST 4 to INST 7 are read 
out and written into unit buffers 1424, 142i, 1422, 
and 1423 of the instruction buffer 42. Then the 
pointer 54 is incremented to point to row #1. Since 
10 the starting position of the next read out is column 
#3, the pointer 55 remains at value 3. Then, the 
process passes through step S16, and returns to step 
«i S12. 



C3 



Since INST5 stored in the unit buffer 142i 

%□ 

15 has flags taking values of F=B=0, it passes through 

\y 

iQ the selector 143i of the selecting circuit 43 and is 

m outputted to the operation part 44. Since INST6 

m stored in the unit buffer 1422 also has flags taking 

E values of F=B=0, it passes through the selector 1433 

2 0 of the selecting circuit 43 and is outputted to the 
fp operation part 46. Since INST7 stored in the unit 

buffer 1423 has flags taking values of F=l , B=0, it 
passes through the selector 1432 of the selecting 
circuit 43 and is outputted to the operation part 45 

2 5 Since INST4 stored in the unit buffer 1423 has flags 
taking values of F=l, B=0 , it passes through the 
selector 1434 of the selecting circuit 43 and is 
outputted to the operation part 47. 

Accordingly, even in a case where the VLIW 

3 0 instruction extends over a plurality of rows 
(blocks) , instructions can be executed in an 
efficient manner. 

Also, the controller 53 may be provided 
with certain condition such as sequences of allowed 
3 5 instructions and an upper limit of the number of 

instructions and may b designed to have a function 
for determining whether the VLIW instructions stored 



in the instruction buffer 41 meet such conditions. 
In such a case, the controller 53 transfers only the 
VLIW instructions which satisfy the predetermined 
conditions to the instruction buffer 42. 

Further, the present invention is not 
limited to these embodiments, and variations and 
modifications may be made without departing from the 
scope of the present invention. 

The present application is based on 
Japanese priority application No, 2001-120175 filed 
on April 18, 2001, the entire contents of which are 
hereby incorporated by reference. 



